Results 1 - 20 of 46
1.
Rev. cuba. inform. méd ; 15(2)dic. 2023.
Article in Spanish | LILACS-Express | LILACS | ID: biblio-1536285

ABSTRACT



Introduction: Current advances in the field of ICTs have driven significant progress in the development of systems that translate plain Spanish text into pictograms. However, current solutions cannot be understood by a person with language difficulties in Cuba, because some of their terminology is absent from everyday Cuban language. Objective: To develop the Pictobana model for the semantic analysis of a Pictotranslator that integrates the semantics of Cuban Spanish. Methods: The model was developed by applying natural language processing techniques. A linguistic analysis was carried out to provide the best possible pictogram representations of the texts. Results: The model is implemented in a web application that helps promote communication skills and abilities for people with speech difficulties in Cuba and their families. Conclusions: Tests based on experiments and expert judgment show that the developed analyzer increases how well the pictograms fit the context and semantics, reducing the incoherence and semantic ambiguity of the future system.
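In highly reduced form, the word-to-pictogram mapping such a translator performs can be sketched as a dictionary lookup. The mini-lexicon below is hypothetical; the Pictobana model described above adds semantic and contextual analysis on top of this kind of base mapping.

```python
# Minimal sketch of dictionary-based text-to-pictogram translation.
# The lexicon entries are invented for illustration.
PICTO_LEXICON = {
    "casa": "house.png",
    "comer": "eat.png",
    "agua": "water.png",
}

def to_pictograms(text):
    """Map each known word to a pictogram file; keep unknown words as text."""
    tokens = text.lower().split()
    return [PICTO_LEXICON.get(tok, tok) for tok in tokens]

print(to_pictograms("Quiero comer en casa"))
# ['quiero', 'eat.png', 'en', 'house.png']
```

The gap this naive lookup leaves (unknown words, ambiguity, regional vocabulary) is exactly what the semantic analyzer is meant to close.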

2.
Rev. cuba. inform. méd ; 15(2)dic. 2023.
Article in Spanish | LILACS-Express | LILACS | ID: biblio-1536297

ABSTRACT



The objective of this study was to describe the perceptions of Facebook users who commented on posts made by the official account of the Ministry of Health of Peru (MINSA) regarding the HPV vaccination campaign. We analyzed 2748 comments in Python with natural language processing, obtaining keywords that were then interpreted manually. Four main types of discourse emerged: a) support for the posts about the HPV vaccine; b) rejection of the HPV vaccine; c) the HPV vaccine in children; and d) doubts about the HPV vaccine. For the most part, users who expressed a position against the vaccine relied on links to online news stories that presented an event supposedly attributed to vaccination or immunization but lacked a reliable and/or verifiable source of information.

3.
Article | IMSEAR | ID: sea-220756

ABSTRACT

This study introduces a technique for leveraging sentiment analysis to detect potential suicide risk among social media users. Our approach utilizes machine learning to scrutinize the textual content of social media posts and identify significant markers of suicidal behavior. Our methodology comprises data collection, data preprocessing, data labeling, machine learning model training, and model testing. The effectiveness of our approach is assessed using precision, recall, and F1 score metrics. The outcome of our evaluation demonstrates that our method is adept at detecting individuals who may be at risk of suicide on social media, yielding an impressive F1 score of 0.85.
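The F1 score reported above is the harmonic mean of precision and recall. A minimal sketch of the computation (the confusion-matrix counts are illustrative, not the study's data):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive
    and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g. 85 true positives, 15 false positives, 15 false negatives
p, r, f1 = precision_recall_f1(85, 15, 15)
print(round(f1, 2))  # 0.85
```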

4.
Journal of China Pharmaceutical University ; (6): 282-293, 2023.
Article in Chinese | WPRIM | ID: wpr-987644

ABSTRACT

In recent years, artificial intelligence (AI) has been widely applied in the field of drug discovery and development. In particular, natural language processing technology has improved significantly since the emergence of pre-trained models. On this basis, the introduction of graph neural networks has also made drug development more accurate and efficient. To help drug developers understand the application of artificial intelligence in drug discovery more systematically and comprehensively, this article introduces cutting-edge AI algorithms, elaborates on the various applications of AI in drug development, including small-molecule drug design, virtual screening, drug repurposing, and drug property prediction, and finally discusses the opportunities and challenges of AI in future drug development.

5.
Cad. Saúde Pública (Online) ; 39(11): e00243722, 2023. tab, graf
Article in Portuguese | LILACS-Express | LILACS | ID: biblio-1550174

ABSTRACT





Patients with post-COVID-19 syndrome benefit from health promotion programs, and their rapid identification is important for the cost-effective use of these programs. Traditional identification techniques perform poorly, especially during pandemics. A descriptive observational study was carried out on 105,008 prior authorizations paid by a private health care provider, applying an unsupervised natural language processing method based on topic modeling to identify patients suspected of COVID-19 infection. A total of six models were generated: three using the BERTopic algorithm and three Word2Vec models. The BERTopic model automatically creates disease groups; in the Word2Vec models, manual analysis of the first 100 cases of each topic was necessary to define the topics related to COVID-19. The BERTopic model with more than 1,000 authorizations per topic and no word preprocessing selected more severe patients: an average cost per paid prior authorization of BRL 10,206 and total expenditure of BRL 20.3 million (5.4%) across 1,987 prior authorizations (1.9%). It achieved 70% agreement with human analysis, plus 20% of cases of potential interest, all eligible for analysis for inclusion in a health promotion program. It missed a substantial number of cases compared with the traditional structured-language query model, and it identified other disease groups: orthopedic, mental, and cancer. The BERTopic model served as an exploratory method to be used for case labeling and subsequent application in supervised models. The automatic identification of other diseases raises ethical questions about the processing of health information by machine learning.
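The cost figures reported above are internally consistent, which can be verified with a quick arithmetic check: the number of selected prior authorizations times the average cost per authorization should give the stated total.

```python
# Sanity check of the abstract's reported figures.
n_auth = 1987      # prior authorizations selected by the BERTopic model
avg_cost = 10206   # average cost per paid prior authorization (BRL)

total = n_auth * avg_cost
print(total)                   # 20279322
print(round(total / 1e6, 1))   # 20.3 (million BRL, as reported)
```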

6.
Rev. Assoc. Med. Bras. (1992, Impr.) ; 69(10): e20230848, 2023. graf
Article in English | LILACS-Express | LILACS | ID: biblio-1514686

ABSTRACT

OBJECTIVE: The aim of this study was to evaluate the performance of ChatGPT-4.0 in answering the 2022 Brazilian National Examination for Medical Degree Revalidation (Revalida) and as a tool to provide feedback on the quality of the examination. METHODS: Two independent physicians entered all examination questions into ChatGPT-4.0. After comparing the outputs with the test solutions, they classified the large language model's answers as adequate, inadequate, or indeterminate. In cases of disagreement, they adjudicated and reached a consensus decision on the ChatGPT accuracy. Performance across medical themes and nullified questions was compared using chi-square analysis. RESULTS: In the Revalida examination, ChatGPT-4.0 answered 71 questions (87.7%) correctly and 10 (12.3%) incorrectly. There was no statistically significant difference in the proportion of correct answers among medical themes (p=0.4886). The model had a lower accuracy of 71.4% on nullified questions, with no statistical difference (p=0.241) between the non-nullified and nullified groups. CONCLUSION: ChatGPT-4.0 showed satisfactory performance on the 2022 Brazilian National Examination for Medical Degree Revalidation. The large language model performed worse on subjective questions and public healthcare themes. The results suggest that the overall quality of the Revalida examination questions is satisfactory and corroborate the nullification decisions.
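The chi-square comparison of proportions used in the study can be sketched in pure Python. The 2x2 counts below are hypothetical, chosen only to illustrate the computation (the abstract does not report the per-group contingency table):

```python
def chi2_statistic(table):
    """Pearson chi-square statistic for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = 0.0
    for obs, row, col in ((a, row1, col1), (b, row1, col2),
                          (c, row2, col1), (d, row2, col2)):
        expected = row * col / n
        chi2 += (obs - expected) ** 2 / expected
    return chi2

# hypothetical correct/incorrect counts: non-nullified vs nullified questions
stat = chi2_statistic([[66, 8], [5, 2]])
print(stat < 3.841)  # below the 5% critical value for df=1 -> not significant
```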

7.
Article in Spanish | LILACS-Express | LILACS | ID: biblio-1536244

ABSTRACT



Sentiment analysis, or opinion mining, is a branch of computing that analyzes opinions, feelings, and emotions in areas of social interest such as products, services, organizations, companies, events, and topics of current interest. The objective of this study was to identify the sentiments and topics present in tweets mentioning the Cuban vaccines Soberana 02 and Abdala on the social network Twitter. The programming languages Python and R, with their data science libraries, were chosen. The first part of the study, from web scraping to quantifying the most used words, was carried out with Python and the libraries tweepy, pandas, re, nltk, and matplotlib; the second part, sentiment analysis and topic detection, was implemented in R using tokenizers, tm, syuzhet, topic modeling, tidyverse, barplot, and wordcloud. Among the most frequent terms on Twitter were doses, vaccines, efficacy, Cubans, candidates, millions, country, people, received, and population. The predominant emotions in the tweets were fear and, slightly ahead, trust; positive polarity predominated, reflecting the context in which the vaccination campaign took place. The identified topics, the terms related to the predominant emotions, and the polarity all indicate consensus around the Soberana 02 and Abdala vaccines.
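The word-quantification step described above can be sketched with the standard library alone. The tweets and stopword list below are invented examples; the study itself collected tweets with tweepy and tokenized with nltk.

```python
from collections import Counter
import re

# tiny illustrative Spanish stopword list (the study used nltk's)
STOPWORDS = {"de", "la", "que", "el", "en", "y", "a", "los", "se", "las"}

def top_terms(tweets, n=3):
    """Count the most frequent non-stopword tokens across a list of tweets."""
    tokens = []
    for tweet in tweets:
        tokens += [t for t in re.findall(r"[a-záéíóúñ]+", tweet.lower())
                   if t not in STOPWORDS]
    return [word for word, _ in Counter(tokens).most_common(n)]

tweets = ["Segunda dosis de Abdala hoy",
          "Abdala y Soberana 02, gran eficacia",
          "Recibí la dosis de Soberana 02"]
print(top_terms(tweets))  # ['dosis', 'abdala', 'soberana']
```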

8.
CoDAS ; 35(1): e20210250, 2023. tab
Article in English | LILACS-Express | LILACS | ID: biblio-1404347

ABSTRACT

Purpose: The purpose of this pilot study was to explore the home language environment and language outcomes of Brazilian toddlers who were hard of hearing (HH) and controls with typical hearing (TH), and to investigate the reliability of using the LENA recording system in a Brazilian Portuguese context. Methods: Fourteen families participated in the study (seven children who were HH and seven controls with TH). Each family contributed one all-day recording. A smaller portion of the recordings of the typically hearing toddlers was manually transcribed by two transcribers. Interrater agreement was assessed, and the human transcript results were then compared against the LENA-generated data for three measures: Adult Words (AW), Child Vocalizations (CV), and Conversational Turns (CT). Results: Data analyses revealed moderate to strong interrater agreement for CV and AW. Weak to moderate agreement was found between the LENA estimates and the means of the human counts for CV and AW; LENA apparently overestimated human counts for AW and underestimated CV. Comparative analysis suggested similarities in the language and listening environments of the two groups (TH vs. HH). Children's language development was supported by higher numbers of parent-child interactions (CT). Conclusion: The findings imply that LENA may serve as an ecologically valid tool in preventive family-centered intervention programs for Brazilian toddlers who are hard of hearing and their families, although further validation studies are needed.
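Agreement between automated (LENA) estimates and human counts can be quantified with a correlation coefficient, among other statistics. A minimal pure-Python sketch with hypothetical per-segment adult-word counts (not the study's data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two paired count series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# hypothetical adult-word counts per segment: LENA vs human transcriber
lena = [120, 95, 143, 80, 110]
human = [100, 90, 130, 85, 100]
print(round(pearson_r(lena, human), 2))  # 0.95
```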



9.
Texto & contexto enferm ; 32: e20220136, 2023. graf
Article in English | LILACS-Express | LILACS, BDENF | ID: biblio-1432481

ABSTRACT

Objective: to describe the development of a virtual assistant as a potential tool for health co-production in coping with COVID-19. Method: this is an applied technological production research study, developed in March and April 2020 in five stages: 1) literature review, 2) content definition, 3) elaboration of the dialog, 4) testing of the prototype, and 5) integration with the social media page. Results: the literature review gathered scientific evidence about the disease from Brazilian Ministry of Health publications and scientific articles. The content was built from the questions most asked by the population in March 2020, as evidenced by Google Trends, from which the following topics emerged: concept of the disease, means of prevention, transmission of the disease, main symptoms, treatment modalities, and doubts. Elaboration of the dialog was based on natural language processing: intents, entities, and dialog structure. The prototype was tested in a laboratory, on a small number of user computers on a local network, to verify the functionality of the set of apps and technical and visual errors in the dialog, and to check that the answers matched the user's question, answered correctly, and were integrated into Facebook. Conclusion: the virtual assistant proved to be a health education tool with potential to combat fake news. It also represents a patient-centered form of health communication that strengthens the bond and interaction between health professionals and patients, promoting co-production in health.
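The intent-matching core of such a dialog can be sketched as keyword overlap. The intents and keywords below are hypothetical illustrations; the actual assistant was built on a full NLP dialog platform with trained intents and entities.

```python
# Toy intent classifier: pick the intent whose keyword set best
# overlaps the user's question. Intents/keywords are invented.
INTENTS = {
    "symptoms": {"symptom", "fever", "cough", "feel"},
    "prevention": {"prevent", "mask", "wash", "distance"},
    "transmission": {"spread", "contagious", "transmit"},
}

def classify(question):
    """Return the best-matching intent name, or None if nothing matches."""
    words = set(question.lower().split())
    best = max(INTENTS, key=lambda name: len(INTENTS[name] & words))
    return best if INTENTS[best] & words else None

print(classify("How can I prevent infection with a mask?"))  # prevention
```

A production system replaces the keyword sets with statistical intent models, but the control flow (classify, then answer from curated content) is the same.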





10.
China Pharmacy ; (12): 2409-2413, 2023.
Article in Chinese | WPRIM | ID: wpr-996400

ABSTRACT

OBJECTIVE To establish the drug-induced liver injury (DILI) surveillance and assessment system (DILI-SAS), and to improve the diagnostic efficiency of clinical DILI. METHODS The DILI-SAS was constructed by using natural language processing technology to mine all inpatient medical record data, combined with the Roussel Uclaf causality assessment method (RUCAM). The medical records of 19 445 hospitalized patients from August 2022 to January 2023 were screened to verify the performance of the system and to manually analyze the basic data of DILI patients and the distribution of the first suspected drugs. RESULTS The overall accuracy of the DILI-SAS system was 91.95%, and the recall was 93.20%. Seventy-five DILI cases were detected, for a DILI incidence of 385.70 per 100 000 people. Human-computer coupled DILI monitoring was about 60 times as efficient as manual monitoring. Among the 75 DILI cases, males (61.33%) and patients over 60 years old (56.00%) were the most common; the clinical type of liver injury was hepatocellular injury (69.33%), the incubation period was mainly 5-90 days after treatment (62.67%), and RUCAM scores between 3 and 5 were the most common (66.67%). The first suspected drugs were mainly dihydropyridines, HMG-CoA reductase inhibitors, and proton pump inhibitors; specific drugs included atorvastatin, omeprazole, ceftriaxone, and metronidazole. CONCLUSIONS The establishment of DILI-SAS can improve evaluation efficiency while maintaining accuracy, and provides a solution for the early identification, diagnosis, and evaluation of clinical DILI.
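A first-pass screening step of such a surveillance system can be sketched as pattern matching over free-text notes. The trigger patterns below are hypothetical; the actual DILI-SAS combines NLP extraction over structured and free-text record data with RUCAM causality scoring.

```python
import re

# Hypothetical trigger patterns for flagging possible liver injury
# in free-text notes (illustration only, not clinical criteria).
LIVER_PATTERNS = [
    r"\bALT\s*(?:>|above)\s*\d+",
    r"\bjaundice\b",
    r"\belevated (?:liver enzymes|transaminases)\b",
]

def flag_record(text):
    """Return True if any liver-injury trigger pattern appears in the note."""
    return any(re.search(p, text, flags=re.IGNORECASE) for p in LIVER_PATTERNS)

print(flag_record("Patient on atorvastatin, ALT > 200 U/L, mild jaundice"))  # True
print(flag_record("Routine follow-up, labs unremarkable"))                   # False
```

Flagged records would then go on to causality assessment (RUCAM) rather than being treated as confirmed cases.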

11.
Chinese Journal of Physical Medicine and Rehabilitation ; (12): 592-597, 2023.
Article in Chinese | WPRIM | ID: wpr-995223

ABSTRACT

Objective: To automatically and rapidly detect mild cognitive impairment (MCI) in an objective manner using natural language processing (NLP). Methods: A total of 215 participants (half female) aged 50 to 80 were recruited for the study's normal cognition and MCI groups. Speech tasks and the Mini-Mental State Examination (MMSE-2) were used to collect audio data and quantify cognitive functioning. Altogether 162 acoustic features were extracted, including speaking speed, syllable number, syllable duration, number of pauses, duration of pauses, standard deviation of formant frequency, and sound pressure variation. They were compared between the two groups and between genders. Multiple regression analysis was used to build a model predicting MCI, and the sensitivity, specificity, and accuracy of its predictions were used to evaluate its predictive power. Results: There were significant differences between the two groups in 50 acoustic features, including pronunciation rhythm and pronunciation accuracy. Univariate correlation analysis revealed that pronunciation rhythm was significantly associated with cognitive functioning. The sensitivity, specificity, and accuracy of the model were 0.54, 0.80, and 0.69 for males and 0.00, 0.86, and 0.63 for females. Conclusion: MCI greatly affects pronunciation rhythm. Acoustic analysis based on NLP can detect MCI rapidly and objectively.
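The sensitivity, specificity, and accuracy reported above derive from a confusion matrix. A sketch of the computation, with hypothetical counts chosen to reproduce the male-group figures (the abstract does not report the actual group sizes):

```python
def screening_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# hypothetical counts reproducing the reported male-group 0.54 / 0.80 / 0.69
sens, spec, acc = screening_metrics(tp=27, fn=23, tn=56, fp=14)
print(round(sens, 2), round(spec, 2), round(acc, 2))  # 0.54 0.8 0.69
```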

12.
Article in Spanish | LILACS, CUMED | ID: biblio-1408108

ABSTRACT



The purpose of this article was to characterize the free text available in an electronic health record of an institution, directed at the care of patients in pregnancy. More than being a data repository, the electronic health record (HCE) has become a clinical decision support system (CDSS). However, due to the high volume of information, as some of the key information in EHR is in free text form, using the full potential that EHR information offers to improve clinical decision-making requires the support of methods of text mining and natural language processing (PLN). Particularly in the area of gynecology and obstetrics, the implementation of PLN methods could help speed up the identification of factors associated with maternal risk. Despite this, in the literature there are no papers that integrate PLN techniques in EHR associated with maternal follow-up in Spanish. Taking into account this knowledge gap, in this work a corpus was generated and characterized from the EHRs of a gynecology and obstetrics service characterized by treating high-risk maternal patients. PLN and text mining methods were implemented on the data, obtaining 659 789 tokens and a dictionary with unique words given by 7 334 tokens. The characterization of the data was developed from the identification of the most frequent words and n-grams and a vector representation of embedding words in a 300-dimensional space was performed using a CBOW (Continuous Bag of Words) neural network architecture. The embedding of words allowed to verify by means of Clustering algorithms, that the words associated to the same group can come to represent associations referring to types of patients, or group similar words, including words written with spelling errors. 
The corpus generated and the results found lay the foundations for future work in the detection of entities (symptoms, signs, diagnoses, treatments), correction of spelling errors and semantic relationships between words to generate summaries of medical records or assist the follow-up of mothers through the automated review of the electronic health record(AU)
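The CBOW architecture mentioned above trains a network to predict a center word from its surrounding context. A minimal sketch of how such (context, target) training pairs are generated from tokenized text, with an illustrative window size and toy sentence (the actual preprocessing of the clinical corpus is not specified in the abstract):

```python
# Sketch: generating (context, target) training pairs for a CBOW model.
# Window size and tokenization here are illustrative assumptions.

def cbow_pairs(tokens, window=2):
    """For each position, pair the surrounding context words with the
    center word the network must learn to predict."""
    pairs = []
    for i, target in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((context, target))
    return pairs

tokens = "paciente con embarazo de alto riesgo".split()
pairs = cbow_pairs(tokens, window=2)
# The first pair predicts "paciente" from its right-hand context only.
print(pairs[0])
```

A CBOW network averages the embeddings of each context list and is trained to output the target word; clustering the resulting vectors is what groups misspelled variants with their correct forms.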


Subject(s)
Humans , Female , Pregnancy , Natural Language Processing , Electronic Health Records
13.
Rev. méd. Chile ; 149(7): 1014-1022, jul. 2021. ilus, graf
Article in Spanish | LILACS | ID: biblio-1389546

ABSTRACT

Background: A significant proportion of the clinical record is in free-text format, making it difficult to extract key information and make secondary use of patient data. Automatic detection of information within narratives initially requires humans, following specific protocols and rules, to identify the medical entities of interest. Aim: To build a linguistic resource of annotated medical entities in texts produced in Chilean hospitals. Material and Methods: A clinical corpus was constructed from 150 referrals in public hospitals. Three annotators identified six medical entities: clinical findings, diagnoses, body parts, medications, abbreviations, and family members. An annotation scheme was designed, and an iterative approach was applied to train the annotators. The F1-score metric was used to assess the progress of the annotators' agreement during their training. Results: An average F1-score of 0.73 was observed at the beginning of the project; after the training period, it increased to 0.87. Annotation of clinical findings and body parts showed significant discrepancy, while abbreviations, medications, and family members showed high agreement. Conclusions: A linguistic resource with annotated medical entities in texts produced in Chilean hospitals was built and made available, working with annotators with a medical background. The iterative annotation approach allowed us to improve agreement metrics. The corpus and annotation protocols will be released to the research community.
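The F1 agreement the study tracked can be computed by treating one annotator's entities as the reference and the other's as predictions. A minimal sketch, assuming exact-match scoring over (start, end, label) tuples (the abstract does not specify the matching mode):

```python
# Sketch: entity-level F1 between two annotators, the metric used above
# to track agreement during annotator training. Entities are modeled as
# (start, end, label) tuples; exact-match scoring is an assumption.

def entity_f1(gold, pred):
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)          # entities both annotators marked identically
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

ann_a = {(0, 7, "Finding"), (12, 20, "Medication"), (25, 30, "Body_Part")}
ann_b = {(0, 7, "Finding"), (12, 20, "Medication"), (40, 45, "Abbreviation")}
print(round(entity_f1(ann_a, ann_b), 2))  # 2 of 3 entities match -> 0.67
```

Averaging this score over annotator pairs and entity types yields the kind of 0.73 → 0.87 progression the study reports.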


Subject(s)
Humans , Electronic Data Processing , Chile
14.
Chinese Journal of School Health ; (12): 465-470, 2021.
Article in Chinese | WPRIM | ID: wpr-875721

ABSTRACT

The possible mechanisms of developmental dyslexia mainly include the language-framework hypothesis and the non-verbal-framework hypothesis. The language framework assumes that people with developmental dyslexia may exhibit deficits in phonological awareness, rapid naming, phonological memory, and orthographic processing. Studies of developmental dyslexia in Chinese have found that deficiencies in orthographic processing may be an important cause of dyslexia, but views on orthographic processing remain diverse. This article reviews the research progress in behavioral and neuroimaging studies of orthography and provides references for the further development of test materials and methods for research on the orthographic-processing mechanism of developmental dyslexia.

15.
Journal of Biomedical Engineering ; (6): 105-110, 2021.
Article in Chinese | WPRIM | ID: wpr-879255

ABSTRACT

Subject recruitment is a key component that affects the progress and results of clinical trials, and it is generally conducted using eligibility criteria (inclusion and exclusion criteria). Semantic-category analysis of eligibility criteria can help optimize clinical trial design and build automated patient-recruitment systems. This study explored the automatic classification of Chinese eligibility criteria into semantic categories, based on artificial intelligence, through an academic shared task. We collected a total of 38 341 annotated eligibility-criteria sentences and predefined 44 semantic categories. A total of 75 teams participated in the competition, and 27 teams submitted system outputs. Based on the results, we found that most teams adopted hybrid models. The mainstream approach was to apply pre-trained language models, which provide rich semantic representations, combine them with neural network models, fine-tune them for the classification task, and finally improve classification performance through ensemble modeling. The best-performing system achieved a macro
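The ensemble step most teams reported can be as simple as a majority vote over the labels predicted by several fine-tuned classifiers. A minimal sketch, with fixed stand-in predictions instead of real model outputs (the semantic-category labels below are illustrative):

```python
# Sketch: majority-vote ensembling of sentence classifiers, the final
# step of the mainstream approach described above. The three "models"
# are stand-ins returning fixed labels for three criteria sentences.
from collections import Counter

def majority_vote(predictions_per_model):
    """predictions_per_model: one list of labels per model, aligned by sentence."""
    ensembled = []
    for labels in zip(*predictions_per_model):
        ensembled.append(Counter(labels).most_common(1)[0][0])
    return ensembled

model_a = ["Age", "Disease", "Consent"]
model_b = ["Age", "Disease", "Disease"]
model_c = ["Age", "Therapy", "Consent"]
print(majority_vote([model_a, model_b, model_c]))
# -> ['Age', 'Disease', 'Consent']
```

With an odd number of base models the vote always resolves; with an even number, ties are typically broken by the model with the best validation score.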


Subject(s)
Humans , Artificial Intelligence , China , Language , Natural Language Processing , Neural Networks, Computer
16.
Rev. méd. Chile ; 147(10): 1229-1238, oct. 2019. tab, graf
Article in Spanish | LILACS | ID: biblio-1058589

ABSTRACT

Background: Free text poses a challenge in health-data analysis, since the lack of structure makes the extraction and integration of information difficult, particularly in the case of massive data. Appropriate machine interpretation of electronic health records in Chile could unlock the knowledge contained in large volumes of clinical text, expanding clinical management and national research capabilities. Aim: To illustrate the use of a weighted-frequency algorithm to find keywords in the diagnostic-suspicion field of the Chilean specialty-consultation waiting list, for diseases not covered by the Chilean Explicit Health Guarantees plan. Material and Methods: The waiting lists for a first specialty consultation for the period 2008-2018 were obtained from 17 of the 29 Chilean health services, and a total of 2,592,925 diagnostic suspicions were identified. A natural language processing technique called Term Frequency-Inverse Document Frequency (TF-IDF) was used to retrieve diagnostic-suspicion keywords. Results: For each specialty, the four keywords with the highest weighted frequency were determined. Word clouds showing words weighted by their importance were created as a visual representation; these are available at cimt.uchile.cl/lechile/. Conclusions: The algorithm allowed unstructured clinical free-text data to be summarized, improving its usefulness and accessibility.
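TF-IDF weights a term by how often it appears in one document, discounted by how many documents contain it, so specialty-specific terms rise to the top. A minimal sketch of the standard formulation, using toy documents rather than waiting-list data (one "document" per specialty, as the study's setup suggests):

```python
# Sketch of the TF-IDF weighting described above, computed per specialty
# "document" (the concatenated diagnostic suspicions of one specialty).
# The toy token lists are illustrative, not data from the waiting list.
import math

def tf_idf(docs):
    n = len(docs)
    df = {}                                  # document frequency per term
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    scores = []
    for doc in docs:
        tf = {t: doc.count(t) / len(doc) for t in set(doc)}
        scores.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return scores

docs = [
    "cefalea cronica cefalea tensional".split(),   # e.g. neurology
    "dolor lumbar cronica".split(),                # e.g. traumatology
]
scores = tf_idf(docs)
print(max(scores[0], key=scores[0].get))
# "cefalea": frequent in the first document and absent from the other
```

Terms shared by every specialty (here "cronica") get an IDF of log(1) = 0 and drop out, which is exactly the behavior that isolates the four distinctive keywords per specialty.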


Subject(s)
Humans , Natural Language Processing , Electronic Data Processing/methods , Medical Records , Information Storage and Retrieval/methods , Diagnostic Techniques and Procedures , Data Mining/methods , Referral and Consultation/statistics & numerical data , Time Factors , Medical Informatics Computing , Chile , Reproducibility of Results , Medicine
17.
Healthcare Informatics Research ; : 99-105, 2019.
Article in English | WPRIM | ID: wpr-740235

ABSTRACT

OBJECTIVES: This study analyzed health technology trends and user sentiment using Twitter data, in an attempt to examine the public's opinions and identify their needs. METHODS: Twitter data related to health technology, from January 2010 to October 2016, were collected. An ontology related to health technology was developed. Frequently occurring keywords were analyzed and visualized with the word-cloud technique. The keywords were then reclassified and analyzed using the developed ontology and a sentiment dictionary. Python and R were used for crawling, natural language processing, and sentiment analysis. RESULTS: In the developed ontology, the keywords are divided into 'health technology' and 'health information'. Under health technology, there are six subcategories: health technology, wearable technology, biotechnology, mobile health, medical technology, and telemedicine. Under health information, there are four subcategories: health information, privacy, clinical informatics, and consumer health informatics. The number of tweets about health technology has consistently increased since 2010; the number of posts in 2014 was double that in 2010, at about 150 thousand posts. Posts about mHealth accounted for the majority, and the dominant words were 'care', 'new', 'mental', and 'fitness'. Sentiment analysis by subcategory showed that posts in nearly all subcategories had a predominantly positive tone. CONCLUSIONS: Interest in mHealth has risen recently, and consequently posts about mHealth were the most frequent. Examining social media users' responses to new health technology can be a useful way to understand trends in rapidly evolving fields.
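Dictionary-based sentiment analysis, as used in the study, scores a post by summing the polarity of each token found in a sentiment lexicon. A minimal sketch; the tiny lexicon and tweet below are placeholders, not the dictionary or data the authors used:

```python
# Sketch: dictionary-based sentiment scoring in the spirit of the study.
# The lexicon is an illustrative stand-in for a real sentiment dictionary.

LEXICON = {"great": 1, "new": 1, "love": 2, "bad": -1, "worried": -2}

def sentiment_score(text):
    """Sum the polarity of every token present in the lexicon;
    unknown tokens contribute 0."""
    tokens = text.lower().split()
    return sum(LEXICON.get(tok, 0) for tok in tokens)

tweet = "Love the new fitness tracker for mental health care"
print(sentiment_score(tweet))  # love(+2) + new(+1) = 3
```

Averaging such scores per subcategory is what produces the "positive tone with a positive score" comparison reported in the results.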


Subject(s)
Biomedical Technology , Biotechnology , Boidae , Data Mining , Informatics , Medical Informatics , Methods , Natural Language Processing , Privacy , Public Opinion , Social Media , Telemedicine
18.
Healthcare Informatics Research ; : 305-312, 2019.
Article in English | WPRIM | ID: wpr-763951

ABSTRACT

OBJECTIVES: Triage is a process to accurately assess and classify symptoms so that patients can be identified and given rapid treatment. The Korean Triage and Acuity Scale (KTAS) is used as the triage instrument in all emergency centers. The aim of this study was to train and compare machine learning models for predicting KTAS levels. METHODS: This was a cross-sectional study using data from a single emergency department of a tertiary university hospital. Information collected during triage was used in the analysis. Logistic regression, random forest, and XGBoost were used to predict the KTAS level. RESULTS: The models with the highest area under the receiver operating characteristic curve (AUROC) were the random forest and XGBoost models trained on the entire dataset (AUROC = 0.922, 95% confidence interval 0.917–0.925, and AUROC = 0.922, 95% confidence interval 0.918–0.925, respectively). The AUROC of the models trained on the clinical data was higher than that of models trained on text data only, but the models trained on all variables had the highest AUROC among similar machine learning models. CONCLUSIONS: Machine learning can robustly predict the KTAS level at triage, which opens many possibilities for use, and adding text data improves predictive performance over that achieved with structured data alone.
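The AUROC used to compare the triage models equals the probability that a randomly chosen positive case is scored above a randomly chosen negative case. A minimal sketch of that rank-based (Mann-Whitney) formulation; the scores and labels are illustrative, not KTAS data:

```python
# Sketch: computing AUROC, the comparison metric in the study above,
# via its probabilistic (rank-sum) definition.

def auroc(labels, scores):
    """labels: 1 = positive class, 0 = negative; scores: model outputs.
    Counts the fraction of positive/negative pairs ranked correctly,
    with ties counted as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.7, 0.2]
print(auroc(labels, scores))  # every positive outranks every negative -> 1.0
```

A value of 0.922, as both top models achieved, means a high-acuity patient receives a higher score than a low-acuity one about 92% of the time.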


Subject(s)
Humans , Cross-Sectional Studies , Dataset , Emergencies , Emergency Service, Hospital , Forests , Logistic Models , Machine Learning , Natural Language Processing , ROC Curve , Triage
19.
Genomics & Informatics ; : e15-2019.
Article in English | WPRIM | ID: wpr-763809

ABSTRACT

Automatically detecting mentions of pharmaceutical drugs and chemical substances is key to the subsequent extraction of relations between chemicals and other biomedical entities such as genes, proteins, diseases, adverse reactions, or symptoms. The identification of drug mentions is also a prerequisite for recognizing complex event types such as drug dosage, duration of medical treatment, or drug repurposing. Formally, this task is known as named entity recognition (NER): automatically identifying mentions of predefined entities of interest in running text. In the domain of medical texts, techniques based on hand-crafted rules and graph-based models can provide adequate performance for chemical entity recognition (CER). In recent years, however, the field of natural language processing has largely pivoted to deep learning, and state-of-the-art results for most natural language tasks are usually obtained with artificial neural networks. Competitive resources for drug-name recognition in English medical texts are already available and heavily used, while for other languages such as Spanish these tools, although clearly needed, were missing. In this work, we adapt an existing neural NER system, NeuroNER, to the particular domain of Spanish clinical case texts and extend the neural network to take additional features into account beyond the plain text. NeuroNER can be considered a competitive baseline system for Spanish drug recognition and CER, promoted by the Spanish national plan for the advancement of language technologies (Plan TL).
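Neural NER systems of this kind are typically trained on token sequences labeled with the BIO scheme, where B- marks the first token of an entity and I- marks its continuation. A minimal sketch of converting annotated spans to BIO tags, with an illustrative Spanish sentence and a hypothetical drug span (the exact label inventory of the adapted system is not stated in the abstract):

```python
# Sketch: converting entity spans to token-level BIO tags, the label
# scheme neural NER systems commonly train on. Sentence and span
# are illustrative examples, not data from the Spanish corpus.

def spans_to_bio(tokens, spans):
    """spans: list of (first_token, last_token, label), inclusive indices."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = "B-" + label             # entity begins here
        for i in range(start + 1, end + 1):
            tags[i] = "I-" + label             # entity continues
    return tags

tokens = ["Se", "administró", "ácido", "acetilsalicílico", "oral"]
print(spans_to_bio(tokens, [(2, 3, "DRUG")]))
# -> ['O', 'O', 'B-DRUG', 'I-DRUG', 'O']
```

The reverse mapping, from predicted BIO tags back to character spans, is what turns the network's output into the drug mentions the downstream relation extraction consumes.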


Subject(s)
Drug Repositioning , Learning , Machine Learning , Natural Language Processing , Neural Networks, Computer , Neurons , Running
20.
Genomics & Informatics ; : e17-2019.
Article in English | WPRIM | ID: wpr-763807

ABSTRACT

Text mining has become an important research method in biology, its original purpose being to extract biological entities, such as genes, proteins, and phenotypic traits, in order to extend the knowledge in scientific papers. However, few thorough studies on text mining and application development for plant molecular biology data have been performed, especially for rice, resulting in a lack of datasets for named-entity recognition tasks in this species. Since benchmarks for rice are rare, we faced various difficulties in exploiting advanced machine learning methods for accurate analysis of the rice literature. To evaluate several approaches to automatically extracting information about gene/protein entities, we built a new dataset for rice as a benchmark. This dataset is composed of titles and abstracts extracted from scientific papers on the rice species, downloaded from PubMed. During the 5th Biomedical Linked Annotation Hackathon, a portion of the dataset was uploaded to PubAnnotation for sharing. Our ultimate goal is to offer a shared task on rice gene/protein name recognition through the BioNLP Open Shared Tasks framework using this dataset, to facilitate open comparison and evaluation of different approaches to the task.


Subject(s)
Benchmarking , Biology , Data Mining , Dataset , Machine Learning , Methods , Molecular Biology , Natural Language Processing , Oryza , Plants